Measuring Constraint-Set Utility for Partitional Clustering Algorithms

نویسندگان

  • Ian Davidson
  • Kiri Wagstaff
  • Sugato Basu
چکیده

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance, with respect to the true data labels. However, in most of these experiments, results are averaged over different randomly chosen constraint sets, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of Agglomerative and Partitional Document Clustering Algorithms

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retrieval performance either via cluster-driven dimensionality reduction, term-weighting, or query expansion. This ever-increasing importance of do...

متن کامل

Limitations of Using Constraint Set Utility in Semi-Supervised Clustering

Semi-supervised clustering algorithms allow the user to incorporate background knowledge into the clustering process. Often, this background knowledge is specified in the form of must-link (ML) and cannot-link (CL) constraints, indicating whether certain pairs of elements should be in the same cluster or not. Several traditional clustering algorithms have been adapted to operate in this setting...

متن کامل

A Particle Swarm Optimization based fuzzy c means approach for efficient web document clustering

There is a need to organize a large set of documents into categories through clustering so as to facilitate searching and finding the relevant information on the web with large number of documents becomes easier and quicker. Hence we need more efficient clustering algorithms for organizing documents. Clustering on large text dataset can be effectively done using partitional clustering algorithm...

متن کامل

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering algorithms, the number of clusters must be specified apriori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to det...

متن کامل

Differential evolution and particle swarm optimisation in partitional clustering

In recent years, many partitional clustering algorithms based on genetic algorithms (GA) have been proposed to tackle the problem of finding the optimal partition of a data set. Surprisingly, very few studies considered alternative stochastic search heuristics other than GAs or simulated annealing. Two promising algorithms for numerical optimization, which are hardly known outside the heuristic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006